Representation learning for clustering via building consensus
Abstract
In this paper, we focus on unsupervised representation learning for clustering of images. Recent advances in deep clustering and unsupervised representation learning are based on the idea that different views of an input image (generated through data augmentation techniques) must be close in the representation space (exemplar consistency), and/or similar images must have similar cluster assignments (population consistency). We define an additional notion of consistency, consensus consistency, which ensures that representations are learned to induce similar partitions for variations in the representation space, different clustering algorithms, or different initializations of a single clustering algorithm. We define a clustering loss by executing variations in the representation space and seamlessly integrate all three consistencies (consensus, exemplar and population) into an end-to-end learning framework. The proposed algorithm, consensus clustering using unsupervised representation learning (ConCURL), improves upon the clustering performance of state-of-the-art methods on four out of five image datasets. Furthermore, we extend the evaluation procedure for clustering to reflect the challenges encountered in real-world clustering tasks, such as maintaining clustering performance in cases with distribution shifts. We also perform a detailed ablation study for a deeper understanding of the proposed algorithm. The code and the trained models are available at https://github.com/JayanthRR/ConCURL_NCE .
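To make the idea of consensus consistency concrete, the sketch below checks how well a set of representations keeps the same partition when the representation space is varied. It is only an illustration under stated assumptions, not the ConCURL loss or the authors' code: the function names (`kmeans`, `rand_index`, `consensus_score`), the use of Gaussian random projections as the "variation", and the Rand index as the agreement measure are all choices made here for the example.

```python
import numpy as np


def kmeans(x, k, n_iter=50):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [x[0]]
    for _ in range(k - 1):
        # Squared distance from every point to its nearest chosen center.
        d = np.min([np.sum((x - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return labels


def rand_index(a, b):
    """Fraction of point pairs on which two partitions agree (same/different cluster)."""
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    iu = np.triu_indices(len(a), k=1)
    return float(np.mean(same_a[iu] == same_b[iu]))


def consensus_score(z, k, n_views=4, out_dim=8, seed=0):
    """Average agreement between the partition of z and partitions of
    randomly projected copies of z (hypothetical stand-in for 'variations
    in the representation space')."""
    rng = np.random.default_rng(seed)
    base = kmeans(z, k)
    scores = []
    for _ in range(n_views):
        proj = rng.normal(size=(z.shape[1], out_dim)) / np.sqrt(out_dim)
        scores.append(rand_index(base, kmeans(z @ proj, k)))
    return float(np.mean(scores))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    centers = 50.0 * np.eye(16)[:3]  # three well-separated synthetic clusters
    z = np.concatenate([c + 0.1 * rng.normal(size=(50, 16)) for c in centers])
    print("consensus score:", consensus_score(z, k=3))
```

A score near 1 means the partitions agree across the projected views; representations whose clusters dissolve under such variations would score lower. In the paper the consistencies are enforced through a training loss, whereas this sketch only measures agreement after the fact.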
Similar resources
Entropy-based Consensus for Distributed Data Clustering
The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...
Semi-supervised Clustering for Short Text via Deep Representation Learning
In this work, we propose a semi-supervised method for short text clustering, where we represent texts as distributed vectors with neural networks, and use a small amount of labeled data to specify our intention for clustering. We design a novel objective to combine the representation learning process and the kmeans clustering process together, and optimize the objective with both labeled data a...
Deep Unsupervised Domain Adaptation for Image Classification via Low Rank Representation Learning
Domain adaptation is a powerful technique when a large amount of labeled data with similar attributes is available in different domains. In real-world applications, there is a huge amount of data, but most of it is unlabeled. Domain adaptation is effective in image classification, where obtaining adequate labeled data is expensive and time-consuming. We propose a novel method named DALRRL, which consists of deep ...
Representation Learning for Clustering: A Statistical Framework
We address the problem of communicating domain knowledge from a user to the designer of a clustering algorithm. We propose a protocol in which the user provides a clustering of a relatively small random sample of a data set. The algorithm designer then uses that sample to come up with a data representation under which kmeans clustering results in a clustering (of the full data set) that is alig...
Improved consensus clustering via linear programming
We consider the problem of Consensus Clustering. Given a finite set of input clusterings over some data items, a consensus clustering is a partitioning of the items which matches as closely as possible the given input clusterings. The best exact approach to tackling this problem is by modelling it as a Boolean Integer Program (BIP). Unfortunately, the size of the BIP grows cubically in the numb...
Journal
Journal title: Machine Learning
Year: 2022
ISSN: 0885-6125, 1573-0565
DOI: https://doi.org/10.1007/s10994-022-06194-9